Bottom-Up Parsing

Bottom-up parsing is more general than top-down parsing
And just as efficient
Builds on ideas in top-down parsing

Bottom-up is the preferred method; it is used in most parser generator tools.

An Introductory Example

Bottom-up parsers don't need left-factored grammars

Revert to the "natural" grammar for our example:

E -> T + E | T
T -> int * T | int | (E)

Consider the string: int * int + int

The Idea

Bottom-up parsing reduces a string to the start symbol by inverting productions:

int * int + int        T -> int
int * T + int          T -> int * T
T + int                T -> int
T + T                  E -> T
T + E                  E -> T + E
E

Observation

Read the productions in reverse (from bottom to top)
They trace a rightmost derivation!

int * int + int        T -> int
int * T + int          T -> int * T
T + int                T -> int
T + T                  E -> T
T + E                  E -> T + E
E

Important Fact #1

Important Fact #1 about bottom-up parsing:

A bottom-up parser traces a rightmost derivation in reverse
By using reductions instead of productions
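
As a minimal sketch of Fact #1, the reductions of the introductory example can be replayed and then read backwards. The check below confirms that each backward step rewrites the rightmost non-terminal. The names REDUCTIONS, reduce_all and is_rightmost_derivation are illustrative assumptions, not part of the slides.

```python
# Reductions of the example, in the order the parser performs them.
# Each entry: (lhs, rhs, position of rhs in the current sentential form).
REDUCTIONS = [
    ("T", ["int"], 2),            # int * int + int -> int * T + int
    ("T", ["int", "*", "T"], 0),  # int * T + int   -> T + int
    ("T", ["int"], 2),            # T + int         -> T + T
    ("E", ["T"], 2),              # T + T           -> T + E
    ("E", ["T", "+", "E"], 0),    # T + E           -> E
]
NONTERMINALS = {"E", "T"}


def reduce_all(tokens):
    """Apply the reductions in order and return every sentential form."""
    forms = [list(tokens)]
    for lhs, rhs, pos in REDUCTIONS:
        cur = forms[-1]
        assert cur[pos:pos + len(rhs)] == rhs, "rhs must match at pos"
        forms.append(cur[:pos] + [lhs] + cur[pos + len(rhs):])
    return forms


def is_rightmost_derivation(forms, steps):
    """Check that every step expands the rightmost non-terminal of its form."""
    for cur, nxt, (lhs, rhs, pos) in zip(forms, forms[1:], steps):
        expands = cur[pos] == lhs and cur[:pos] + rhs + cur[pos + 1:] == nxt
        rightmost = not any(s in NONTERMINALS for s in cur[pos + 1:])
        if not (expands and rightmost):
            return False
    return True


forms = reduce_all("int * int + int".split())
# Reading forms and reductions bottom-to-top gives a rightmost derivation.
print(is_rightmost_derivation(forms[::-1], REDUCTIONS[::-1]))
```

Reversing both lists turns the sequence of reductions into a derivation that starts at E; the positions stay valid because a reduction and its inverse act at the same index.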

A Bottom-up Parse

int * int + int
int * T + int
T + int
T + T
T + E
E

[Figure: parse tree constructed from the given sequence of reductions]

A Bottom-up Parse in Detail (1)-(3)

(1)  int * int + int
(2)  int * T + int
(3)  T + int

[Figure: partial parse tree after each step]

Dr. Sherin ElGokhy 



A Bottom-up Parse in Detail (4)-(6)

(4)  T + T
(5)  T + E
(6)  E

[Figure: partial parse tree after each step, ending with the complete tree]

Quiz

For the given grammar, what is the correct series of reductions for the string: -(id + id) + id

E  -> E' | E' + E
E' -> -E' | id | (E)

(a)
-(id + id) + id
-(id + E') + id
-(id + E) + id
-(E' + E) + id
-(E) + id
-E' + id
E' + id
E' + E'
E' + E
E

(b)
-(id + id) + id
-(E' + id) + id
-(E' + E') + id
-(E' + E) + id
-(E) + id
-E' + id
E' + id
E' + E'
E' + E
E

(c)
-(id + id) + id
-(E' + id) + id
-(E' + E') + id
-(E' + E') + E'
-(E' + E) + E'
-(E) + E'
-E' + E'
E' + E'
E' + E
E

(d)
-(id + id) + id
-(id + id) + E'
-(id + id) + E
-(E' + id) + E
-(E' + E') + E
-(E' + E) + E
-(E) + E
-E' + E
E' + E
E

Important Fact #1

Important Fact #1 about bottom-up parsing:

A bottom-up parser traces a rightmost derivation in reverse
By using reductions instead of productions

Where Do Reductions Happen?

Important Fact #1 has an interesting consequence:

• Let αβω be a step of a bottom-up parse
• Assume the next reduction is by X -> β
• Then ω is a string of terminals

Why?

Because αXω -> αβω is a step in a right-most derivation

A Trivial Bottom-Up Parsing Algorithm

Let I = input string
repeat
    pick a non-empty substring β of I
        where X -> β is a production
    if no such β, backtrack
    replace one β by X in I
until I = "S" (the start symbol) or all possibilities are exhausted
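
The trivial algorithm can be sketched as a backtracking search. This is only an illustration under stated assumptions: the grammar encoding GRAMMAR, the function name trivial_parse, and the use of a set of already-failed forms to avoid repeated work are choices made here, not part of the slides.

```python
# The example grammar: E -> T + E | T,  T -> int * T | int | (E)
GRAMMAR = [
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
]


def trivial_parse(form, start="E", seen=None):
    """Return the sentential forms from `form` down to (start,), or None."""
    form = tuple(form)
    if seen is None:
        seen = set()
    if form == (start,):
        return [form]
    if form in seen:              # explored before and found to be a dead end
        return None
    seen.add(form)
    for lhs, rhs in GRAMMAR:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                # Replace one occurrence of rhs by lhs, then keep going.
                reduced = form[:i] + (lhs,) + form[i + n:]
                rest = trivial_parse(reduced, start, seen)
                if rest is not None:
                    return [form] + rest
    return None                   # no substring works here: backtrack


for step in trivial_parse("int * int + int".split()):
    print(" ".join(step))
```

The search tries reductions anywhere in the string, so it may explore many dead ends before it succeeds, which is the inefficiency that shift-reduce parsing is designed to avoid.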

Notation

Idea: Split string into two substrings
    Right substring is as yet unexamined by parsing (a string of terminals)
    Left substring has terminals and non-terminals

• The dividing point is marked by a |
    The | is not part of the string

Initially, all input is unexamined: |x1 x2 . . . xn

Shift-Reduce Parsing

Bottom-up parsing uses only two kinds of actions:

Shift
Reduce

Shift

Shift: Move | one place to the right
Shifts a terminal to the left string

ABC|xyz => ABCx|yz

Reduce

• Apply an inverse production at the right end of the left string

• If A -> xy is a production, then

Cbxy|ijk => CbA|ijk
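
The two moves can be written as small functions on a pair (left string, right string). This is a sketch; the representation as Python lists and the names shift and reduce_by are assumptions made for illustration.

```python
def shift(left, right):
    """Move | one place to the right: the next terminal joins the left string."""
    if not right:
        raise ValueError("nothing left to shift")
    return left + right[:1], right[1:]


def reduce_by(left, right, lhs, rhs):
    """Replace rhs at the right end of the left string by lhs."""
    n = len(rhs)
    if n and left[-n:] != rhs:
        raise ValueError("rhs is not at the right end of the left string")
    return left[:len(left) - n] + [lhs], right


# ABC|xyz => ABCx|yz
print(shift(list("ABC"), list("xyz")))

# With the production A -> xy:  Cbxy|ijk => CbA|ijk
print(reduce_by(list("Cbxy"), list("ijk"), "A", list("xy")))
```

Neither function touches anything to the left of the symbols being reduced, and neither ever moves the | to the left.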

The Example with Reductions Only

int * int | + int      reduce T -> int
int * T | + int        reduce T -> int * T

T + int |              reduce T -> int
T + T |                reduce E -> T
T + E |                reduce E -> T + E
E |

The Example with Shift-Reduce Parsing

| int * int + int      shift
int | * int + int      shift
int * | int + int      shift
int * int | + int      reduce T -> int
int * T | + int        reduce T -> int * T
T | + int              shift
T + | int              shift
T + int |              reduce T -> int
T + T |                reduce E -> T
T + E |                reduce E -> T + E
E |

A Shift-Reduce Parse in Detail (1)-(11)

(1)   | int * int + int
(2)   int | * int + int
(3)   int * | int + int
(4)   int * int | + int
(5)   int * T | + int
(6)   T | + int
(7)   T + | int
(8)   T + int |
(9)   T + T |
(10)  T + E |
(11)  E |

[Figure: partial parse tree after each step]

Note about the implementation

The example shows only that there exists a sequence of shift and reduce moves that succeeds in parsing the input

However, it does not explain how we know whether to shift or reduce






Quiz

For the given grammar, what is the correct shift-reduce parse for the string: id + -id

E  -> E' | E' + E
E' -> -E' | id | (E)

(a)
| id + -id
| E' + -id
E' | + -id
E' + | -id
E' + - | id
E' + -id |
E' + -E' |
E' + E' |
E' + E |
E |

(b)
| id + -id
id | + -id
id + | -id
id + - | id
id + -id |
id + -E' |
id + E' |
id + E |
E' + E |

(c)
| id + -id
id | + -id
E' + | -id
E' + - | id
E' + - | E'
E' + | -E'
E' + E'
E' + E
E' | + E
| E' + E
| E

(d)
| id + -id
id | + -id
E' | + -id
E' + | -id
E' + - | id
E' + -id |
E' + -E' |
E' + E' |
E' + E |
E

Conflicts

In a given state, more than one action (shift or reduce) may lead to a valid parse

• If it is legal to shift or reduce, there is a shift-reduce conflict

• If it is legal to reduce by two different productions, there is a reduce-reduce conflict

Summary: Shift-Reduce Parsing

Bottom-up parsing uses only two kinds of actions:

Shift

Read one input token and move the vertical bar one place to the right

• ABC|xyz => ABCx|yz

Reduce

Replace the right-hand side of a production (the sequence immediately to the left of the vertical bar) by the left-hand side of that production

• With A -> xy: Cbxy|ijk => CbA|ijk

The Stack

Left string can be implemented by a stack
• Top of the stack is the |

Shift pushes a terminal on the stack

Reduce
• pops 0 or more symbols off of the stack (production rhs)
• pushes a non-terminal on the stack (production lhs)
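
A minimal sketch of the stack view, assuming the sequence of actions is already known: the function below replays a script of shift and reduce actions on an explicit stack. The script ACTIONS is the one from the worked example; the function name run and the action encoding are assumptions made for illustration.

```python
def run(tokens, actions):
    """Replay a script of shift/reduce actions on an explicit stack."""
    stack, pos, trace = [], 0, []
    for action in actions:
        if action == "shift":
            stack.append(tokens[pos])      # push the next terminal
            pos += 1
        else:
            lhs, rhs = action
            n = len(rhs)
            assert stack[len(stack) - n:] == list(rhs), "rhs must be on top"
            del stack[len(stack) - n:]     # pop the production rhs
            stack.append(lhs)              # push the production lhs
        state = " ".join(stack) + " | " + " ".join(tokens[pos:])
        trace.append(state.strip())
    return stack, trace


ACTIONS = [
    "shift", "shift", "shift",
    ("T", ("int",)),
    ("T", ("int", "*", "T")),
    "shift", "shift",
    ("T", ("int",)),
    ("E", ("T",)),
    ("E", ("T", "+", "E")),
]

stack, trace = run("int * int + int".split(), ACTIONS)
print("\n".join(trace))
```

The script is supplied by hand here; choosing the actions automatically is the problem the rest of the lecture addresses.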

Key Issue

How do we decide when to shift or reduce?
Example grammar:

E -> T + E | T
T -> int * T | int | (E)

• Consider step int | * int + int

We could reduce by T -> int giving T | * int + int
A fatal mistake!

No way to reduce to the start symbol E
There is no production whose right-hand side starts with T *
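
To check that this reduction really is fatal, the sketch below searches exhaustively over every shift/reduce continuation of a state. The grammar encoding and the helper name can_finish are assumptions made for illustration; the search is brute force and is not how a practical parser decides what to do.

```python
from functools import lru_cache

# The example grammar: E -> T + E | T,  T -> int * T | int | (E)
GRAMMAR = (
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
)


@lru_cache(maxsize=None)
def can_finish(stack, rest, start="E"):
    """True if some sequence of moves leads from stack | rest to start |."""
    if stack == (start,) and not rest:
        return True
    # Try every reduction whose rhs is on top of the stack.
    for lhs, rhs in GRAMMAR:
        n = len(rhs)
        if stack[len(stack) - n:] == rhs:
            if can_finish(stack[:len(stack) - n] + (lhs,), rest, start):
                return True
    # Otherwise try a shift, if any input remains.
    return bool(rest) and can_finish(stack + rest[:1], rest[1:], start)


rest = ("*", "int", "+", "int")
print(can_finish(("int",), rest))  # state  int | * int + int
print(can_finish(("T",), rest))    # state  T | * int + int
```

From int | * int + int a successful parse is still reachable (by shifting), while from T | * int + int no sequence of moves reaches the start symbol.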

Handles

Intuition: Want to reduce only if the result can still be reduced to the start symbol

We do not want to reduce just because we have the right-hand side of a production on top of the stack

• Assume a rightmost derivation

S ->* αXω -> αβω

• Then X -> β in the position after α is a handle of αβω

Handles (Cont.)



Handles formalize the intuition 

A handle is a string that can be reduced and also 
allows further reductions back to the start symbol 

(using a particular production at a specific spot) 



We only want to reduce at handles 



Note: We have said what a handle is, not how to find 
handles 
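
The definition of a handle can be turned directly into a brute-force check: a reduction is a handle if everything to its right is terminals and the reduced string can itself be reduced, handle by handle, to the start symbol. The sketch below assumes the example grammar; the names handles, NONTERMINALS and START are illustrative. It says what a handle is by exhaustive search and is not an efficient way to find one.

```python
from functools import lru_cache

# The example grammar: E -> T + E | T,  T -> int * T | int | (E)
GRAMMAR = (
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
)
NONTERMINALS = {"E", "T"}
START = "E"


@lru_cache(maxsize=None)
def handles(form):
    """All (lhs, rhs, position) that are handles of the sentential form."""
    found = []
    for lhs, rhs in GRAMMAR:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] != rhs:
                continue
            if any(sym in NONTERMINALS for sym in form[i + n:]):
                continue          # the part after the handle must be terminals
            reduced = form[:i] + (lhs,) + form[i + n:]
            # The reduction must still allow a path back to the start symbol.
            if reduced == (START,) or handles(reduced):
                found.append((lhs, rhs, i))
    return tuple(found)


print(handles(tuple("int * int + int".split())))
print(handles(tuple("int * T + int".split())))
```

For int * int + int the only handle is T -> int at the second int; reducing the first int matches a right-hand side but is not a handle.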

Important Fact #2

Important Fact #2 about bottom-up parsing:

In shift-reduce parsing, handles appear only at the top of the stack, never inside

Why? By induction on the number of reduce moves:

True initially, stack is empty

Immediately after reducing a handle
• right-most non-terminal on top of the stack
• next handle must be to right of right-most non-terminal, because this is a right-most derivation
• Sequence of shift moves reaches next handle






Summary of Handles

In shift-reduce parsing, handles always appear at the top of the stack

Handles are never to the left of the rightmost non-terminal

Therefore, shift-reduce moves are sufficient; the | never needs to move left

Bottom-up parsing algorithms are based on recognizing handles

Quiz

Given the grammar below, identify the handle for the following shift-reduce parse state: E' + -id | + -(id + id)

E  -> E' | E' + E
E' -> -E' | id | (E)

(a) E' + -id
(b) id
(c) -id
(d) E' + -E'

Recognizing Handles

In general, there are no known efficient algorithms to recognize handles

Solution: use heuristics to guess which stacks are handles

On some CFGs, the heuristics always guess correctly

For the heuristics we use here, these are the SLR grammars
Other heuristics work for other grammars

Viable Prefixes

It is not obvious how to detect handles

At each step the parser sees only the stack, not the entire input; start with that . . .

α is a viable prefix if there is an ω such that α|ω is a state of a shift-reduce parser

What does this mean?



A viable prefix does not extend past the right end of the 
handle 

It's a viable prefix because it is a prefix of the handle 

As long as a parser has viable prefixes on the stack no 
parsing error has been detected 
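
As a small illustration, the stack contents along a successful shift-reduce parse are all viable prefixes. The sketch below finds one successful parse of the input (int) by brute-force search and lists the stacks it passes through. The grammar encoding and the helper name successful_parse are assumptions made for illustration.

```python
# The example grammar: E -> T + E | T,  T -> int * T | int | (E)
GRAMMAR = (
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("int", "*", "T")),
    ("T", ("int",)),
    ("T", ("(", "E", ")")),
)


def successful_parse(stack, rest, start="E"):
    """Return the states (stack, rest) of one successful parse, or None."""
    if stack == (start,) and not rest:
        return [(stack, rest)]
    moves = []
    for lhs, rhs in GRAMMAR:      # reductions with rhs on top of the stack
        n = len(rhs)
        if stack[len(stack) - n:] == rhs:
            moves.append((stack[:len(stack) - n] + (lhs,), rest))
    if rest:                      # shift the next terminal
        moves.append((stack + rest[:1], rest[1:]))
    for nxt in moves:
        tail = successful_parse(*nxt, start)
        if tail is not None:
            return [(stack, rest)] + tail
    return None


states = successful_parse((), ("(", "int", ")"))
prefixes = [" ".join(stack) for stack, _ in states]
print(prefixes)
```

Every reduction in such a parse happens at the top of the stack with only terminals remaining in the input, so each state is a step of a reversed rightmost derivation and its stack is a viable prefix.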

Important Fact #3

Important Fact #3 about bottom-up parsing:

For any grammar, the set of viable prefixes is a regular language

Important Fact #3 (Cont.)



Important Fact #3 is non-obvious 



We want to show how to compute automata that accept 
viable prefixes 

Items

An item is a production with a "." somewhere on the rhs

The items for T -> (E) are
T -> .(E)
T -> (.E)
T -> (E.)
T -> (E).

Items (Cont.)

The only item for X -> ε is X -> .

Items are often called "LR(0) items"
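
A short sketch of how the items of a production can be enumerated: one item per possible position of the dot. The tuple representation (lhs, rhs, dot) and the names lr0_items and show are assumptions made for illustration.

```python
def lr0_items(lhs, rhs):
    """All items of lhs -> rhs: the dot can sit at positions 0..len(rhs)."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item as text, with symbols separated by spaces."""
    lhs, rhs, dot = item
    return f"{lhs} -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])


for item in lr0_items("T", ["(", "E", ")"]):
    print(show(item))

# A production with an empty right-hand side has exactly one item.
print([show(item) for item in lr0_items("X", [])])
```

A production with n symbols on its rhs has n + 1 items, which is why the ε-production has exactly one.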

Intuition



The problem in recognizing viable prefixes is that the 
stack has only pieces of the rhs of productions 
If it had a complete rhs, we could reduce 

These pieces are always prefixes of rhs of productions 

Example

Consider the input (int)

• Then (E|) is a state of a shift-reduce parse

• (E is a prefix of the rhs of T -> (E)

• Will be reduced after the next shift

• Item T -> (E.) says that so far we have seen (E of this production and hope to see ) before we can perform a reduction

An Example

Consider the string (int * int):

(int *|int) is a state of a shift-reduce parse

"(" is a prefix of the rhs of T -> (E)

"ε" is a prefix of the rhs of E -> T

"int *" is a prefix of the rhs of T -> int * T

An Example (Cont.)

The "stack of items"

T -> (.E)
E -> .T
T -> int * .T

Says

We've seen "(" of T -> (E)
We've seen ε of E -> T
We've seen int * of T -> int * T

Recognizing Viable Prefixes

Idea: To recognize viable prefixes, we must

Recognize a sequence of partial rhs's of productions, where

Each partial rhs can eventually reduce to part of the missing suffix of its predecessor

Thanks